Europäisches Patentamt
European Patent Office
Office européen des brevets

Publication number: 0 357 188 A2

EUROPEAN PATENT APPLICATION

Application number: 89306583.9
Date of filing: 28.06.89
Int. Cl.5: G06F 9/38

The title of the invention has been amended
(Guidelines for Examination in the EPO, A-III, 7.3).

Priority: 27.07.88 GB 8817912
Date of publication of application: 07.03.90 Bulletin 90/10
Designated Contracting States: DE FR GB IT NL

Applicant: INTERNATIONAL COMPUTERS LIMITED
Putney, London, SW15 1SW (GB)

Inventor: Duxbury, Colin Martin
55, Poleacre Lane
Woodley Stockport SK6 1PH (GB)
Inventor: Eaton, John Richard
52, Victory Road
Salford Lancashire M6 8EY (GB)
Inventor: Rose, Philip Vivian
45 Meade Hill Road
Higher Crumpsall Manchester M8 6LT (GB)

Representative: Guyatt, Derek Charles
STC Patents West Road
Harlow Essex CM20 2SH (GB)

Pipelined processor.



In a pipelined data processor, when a dependency is detected
between a first instruction and a second, subsequent
instruction, the second instruction is abandoned. A look-ahead
mode of operation is then initiated, in which instructions
subsequent to the abandoned instruction are allowed to continue
to be executed so as to pre-fetch operands, but are not allowed
to be fully executed. The processor has two separate streams of
instructions, each of which streams can be independently put
into look-ahead mode. When one stream is in look-ahead mode,
the other is given priority.
DATA PROCESSING APPARATUS



This invention relates to data processing ap- 
paratus of the kind having a series of stages which 
execute successive instructions in an overlapped 
manner. Such apparatus is usually referred to as a 
pipelined processor. 

One problem that arises in apparatus of this 
kind is that the overall speed of operation is re- 
duced by dependencies between successive 
instructions. For example, an instruction may be 
held up because it requires to read data from a 
register that has not yet been written by an earlier 
instruction. 

One object of the present invention is to provide a novel
pipelined processor, in which the problem of dependencies is
reduced.



Summary of the Invention 

According to the invention, there is provided 
data processing apparatus comprising: 

(a) a plurality of pipeline stages for executing 
a sequence of instructions in a pipelined manner, 

(b) means operative upon detection of a de- 
pendency between a first instruction and a second, 
subsequent instruction, for causing the second in- 
struction to be abandoned, and for initiating a look- 
ahead mode of operation, 

(c) means operative in the look-ahead mode, for allowing
instructions subsequent to the abandoned instruction to
continue to be executed so as to prefetch any operands for
those instructions, but not to be fully executed, and

(d) means for terminating the look-ahead mode and for
re-starting execution at the abandoned instruction.



Brief Description of the Drawings 



One processing apparatus in accordance with 
the invention will now be described by way of 
example with reference to the accompanying draw- 
ings. 

Figure 1 is an overall diagram of the apparatus.

Figure 2 shows an upper pipeline unit in more detail.

Figure 3 shows a fast data slave store in more detail.

Figure 4 shows a lower pipeline unit in more detail.

Figure 5 shows slot pointers for controlling the flow of
instructions through the pipeline units.

Figures 6 to 8 show control logic for controlling the
initiation of instructions in the upper pipeline.

Figure 9 shows look-ahead mode control logic.

Figures 10 and 11 show control logic for controlling the
initiation of instructions in the lower pipeline.



Description of an embodiment of the invention.



Overall description of System 

Referring to Figure 1, the data processing apparatus comprises
a series of pipeline units as follows:

an instruction scheduler 10, an upper pipeline unit 11, a fast
data slave store 12, and a lower pipeline unit 13.

The pipeline units 10-13 are interconnected by parameter files
as follows:

an instruction parameter file IPF, an address parameter file
APF, and an operand parameter file OPF. These allow instruction
parameters to be passed between the pipeline units.

The scheduler 10 has a fast code slave 14 
associated with it, for holding copies of instructions 
for access by the scheduler. 

The system also includes a main store 15 of larger size but
slower access speed than the slave stores 12, 14, and a slow
slave store 16 of size and speed intermediate between those of
the main store and the fast slaves. The fast slaves, the slow
slave, and the main store form a three-level storage hierarchy.

The scheduler 10 comprises two scheduler units 10A and 10B, for
scheduling two separate streams of instructions, referred to as
stream A and stream B. Stream A is dedicated to the main
processing workload of the system. Stream B handles events that
are independent of this main processing workload, such as
managing input/output activity, and communication with other
processors. The provision of two independent streams allows
more effective use of the hardware of the system. For example,
as will be shown, when one stream is held up for some reason,
the other stream can continue processing, so that the hardware
is not idle.

Each of the scheduler units 10A, 10B generates a sequence of
instruction addresses, for retrieving instructions from the
code slave 14. If the required instruction is not in the code
slave, it is retrieved from the slow slave 16 or from the main
store 15. The retrieved instructions are written into the IPF.
Each instruction is accompanied by a program counter value PC
which is also written into a portion of the IPF referred to as
IPF.pc. The IPF has dual ports, so that the scheduler units
10A, 10B can load the IPF simultaneously.

Each of the parameter files IPF, APF and OPF 
(as well as another parameter file TPF to be de- 
scribed later) comprises sixteen registers, and can 
therefore hold parameters for up to sixteen different 
instructions at various stages of execution. The set 
of registers relating to a particular instruction is 
referred to as a slot: that is, each slot comprises a 
corresponding register from each of the register 
files. 

Ten of the slots are allocated to stream A and 
six to stream B. 

When an instruction is initially entered into the IPF from the
scheduler, it is assigned a slot, i.e. it is assigned a
register in IPF and a corresponding register in each of the
other parameter files. The instruction then retains this slot
until it has been successfully executed by all stages of each
pipeline unit, whereupon the slot is released so that it is
available for another instruction from the scheduler. As an
instruction passes down the pipeline, the slot number assigned
to that instruction is passed down the pipeline with it, so
that at each pipeline stage the appropriate register in the
parameter file can be accessed.
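The slot life cycle described above can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the class name `SlotAllocator`, the slot numbering (0-9 for stream A, 10-15 for stream B) and the use of unbounded integers instead of wrap-around hardware counters are all hypothetical and are not taken from the application.

```python
NUM_SLOTS = 16             # sixteen registers per parameter file (from the text)
A_SLOTS = range(0, 10)     # ten slots for stream A; numbering 0-9 is an assumption
B_SLOTS = range(10, 16)    # six slots for stream B; numbering 10-15 is an assumption


class SlotAllocator:
    """Hands out one stream's slots in order and frees them in order.

    Hypothetical helper: `load` and `eldest` are plain integers here,
    standing in for whatever hardware tracks the next slot to load and
    the oldest slot still in flight.
    """

    def __init__(self, slots):
        self.slots = list(slots)
        self.load = 0      # number of instructions loaded so far
        self.eldest = 0    # number of instructions finished so far

    def full(self):
        # Every slot of this stream is held by an unfinished instruction.
        return self.load - self.eldest == len(self.slots)

    def allocate(self):
        if self.full():
            raise RuntimeError("no free slot for this stream")
        slot = self.slots[self.load % len(self.slots)]
        self.load += 1
        return slot

    def release(self):
        # Called when the eldest instruction completes successfully.
        slot = self.slots[self.eldest % len(self.slots)]
        self.eldest += 1
        return slot
```

An instruction would keep the slot number returned by `allocate()` for its whole transit, and that number would index the same register position in IPF, APF, OPF and TPF.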

The upper pipeline 11 reads instructions from the IPF and
processes them, so as to calculate the address of the required
operand for the instruction. This may, for example, involve
adding a displacement value to a base address held in an
internal register, such as a local name base register.
Alternatively, the address may be a literal value held in the
instruction. The operand address is placed in the APF in the
slot appropriate to the instruction in question.
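As a rough illustration of the two addressing cases just mentioned, the following Python sketch computes an operand address either as base plus displacement or as a literal. The dictionary-based instruction encoding and the key names (`base`, `displacement`, `literal`) are invented for the example; the application does not describe the instruction format.

```python
def operand_address(instruction, registers):
    """Return the operand address that would be placed in the APF slot.

    `instruction` is a hypothetical dict: either {"literal": addr} or
    {"base": register_name, "displacement": offset}.
    """
    if "literal" in instruction:
        # The address is a literal value held in the instruction itself.
        return instruction["literal"]
    # Otherwise add the displacement to a base address held in a register.
    return registers[instruction["base"]] + instruction["displacement"]
```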

The data slave 12, when it is free, reads an address from the
APF and retrieves the required operand, if it is present in the
data slave, or alternatively initiates fetching of the operand
from the slow slave or the main store. The retrieved operand is
placed in the OPF in the slot appropriate to the instruction in
question. Additionally, data from the slave may be returned to
the upper pipeline so as to update one of the internal
registers in that unit.

The lower pipeline 13 reads the operand from 
the OPF and performs the required operation on it 
as specified by the instruction. For example, this 
may involve adding the operand to the contents of 
an accumulator register. 

Upper Pipeline 



Referring now to Figure 2, this shows the upper pipeline unit
11 in more detail.

The upper pipeline unit includes five pipeline stages referred
to as UP0 - UP4.

The first stage UP0 contains logic, to be described below, for
selecting a slot from the IPF, so as to initiate processing of
the instruction in that slot.

Normally, instructions in each stream are started in the upper
pipeline in chronological order. Also normally, stream A is
given priority over stream B, so that a B-stream instruction is
started only if there are no A-stream instructions available in
IPF. However, stream B may be given priority in certain
circumstances.

After an instruction has been started, the upper pipeline may
detect that the instruction cannot be successfully completed
yet, because of a dependency on an earlier instruction. In this
case, the instruction is abandoned. However, instructions
following the abandoned instruction are allowed to continue
running in a special mode called look-ahead mode, the purpose
of which is to allow operands for those instructions to be
prefetched, if necessary, into the fast data slave. Such
look-aheads are allowed only if they do not generate any
further dependencies. The look-ahead mode can be initiated for
streams A and B independently. When stream A is in look-ahead
mode but not stream B, then stream B is given priority. When
the dependency has been resolved, the stream is returned to
normal non-look-ahead mode, and the abandoned instruction is
restarted in the upper pipeline.

UP1 comprises a decoder 20 which decodes the instruction from
the selected slot to generate control signals for UP2, these
being stored in a pipeline register 21. The decoder 20 also
produces an output signal which is passed to UP2 for further
decoding, by way of a register 23.

UP2 contains a set of registers 24 which repre- 
sent local copies of the registers specified by the 
instruction set of the system. The definitive copies 
of these registers are actually in the lower pipeline. 

UP2 also contains a multiplexing circuit 25, which is
controlled by the value in register 21, and which selects input
data for the registers 24 from one of the following sources:

(a) An output data signal V from the lower pipeline.

(b) A corrected register value UP.CORR from the lower pipeline.

(c) A data signal RD from the data slave.

UP2 also contains a decoder 26 which further decodes the
contents of register 23 to produce a set of control signals for
UP3, these being stored in a register 27. The decoder also
produces a function code F which is stored in a register 28.
UP3 comprises a multiplexing circuit 29 which selects input
data for a set of arithmetic registers 210, under control of
the value in register 27. The input data is selected from the
following sources:

(a) A literal value N, which is obtained from the decoder 20 in
UP1 by way of pipeline registers 211, 212.

(b) A program counter value PC, which is 
obtained from IPF.pc by way of pipeline registers 
213, 214, and 217. 

(c) The registers 24. 

UP3 also contains a register 215 which passes the function code
F to UP4. The function code F is also passed from UP3 to the
parameter files APF and OPF where it is stored in the
appropriate slot. The portions of these parameter files which
store the function code are referred to as APF.F and OPF.F.

UP4 contains an arithmetic and logic unit (ALU) 216, which
performs an operation on the contents of the registers 210
under control of the function code F in the register 215. The
result of the operation is passed to the APF, where it is
written into the appropriate slot.

In a further stage UP5 (not shown), the address generated in
UP4 may be checked for architectural validity. Any error will
cause a later unsuccessful termination of this slot.



Data Slave 

Referring now to Figure 3, this shows the fast data slave 12 in
more detail.

The data slave includes five pipeline stages DS0 - DS4.

The first stage DS0 comprises a priority logic circuit for
selecting the next slot from the APF to be handled by the data
slave.

DS1 contains a decoder 30 which decodes the address selected
from APF, to produce a byte shift value indicating the
alignment of the required data item within a 32-byte block. The
address and the byte shift value are passed to DS2 by way of
registers 31 and 32.

DS2 comprises a contents-addressable memory (CAM) 33, which
holds the addresses of data items currently in the data slave.
The CAM 33 receives the operand address from DS1, and compares
it with all the addresses held in the CAM. If there is a match,
the CAM produces a signal VHIT, and at the same time outputs a
tag value indicating which 32-byte block of the data slave the
required item is held in. The tag value is passed to DS3 by way
of a register 34.

If the required data item is not in the data slave (VHIT
false), the data slave triggers an access to the slow slave,
which will cause the data to be fetched, either from the slow
slave or the main store, and loaded into the data slave.

DS2 also includes a byte alignment circuit 35 which receives a
data item W from the lower pipeline and aligns it according to
the byte shift value held in the register 32. The aligned data
item is stored in a register 36. DS2 also includes registers 37
and 38 which receive data items returned to the data slave from
the main store and slow slave respectively.

DS3 comprises a random-access memory (RAM) 39 which holds a
number of individually addressable 32-byte blocks of data. The
RAM is addressed by the tag value from register 34. Data can be
written into the RAM, by way of a multiplexer 310, from any of
the registers 36, 37 and 38. Alternatively, a block of data can
be read out of the RAM and passed to DS4 by way of a register
311.

DS4 comprises a byte alignment circuit 312. This is controlled
by a byte shift value, received from the decoder 30 by way of
pipeline registers 32, 313 and 314. The alignment circuit 312
selects the required data item from the block held in register
311, and passes it to the OPF. The selected data item is also
supplied to the upper and lower pipelines as data signal RD.
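The read path through stages DS1 to DS4 can be approximated in a few lines of Python. This is a simplified sketch, not the circuit itself: the class `DataSlave`, the use of a dictionary for the CAM and a list for the RAM, and the restriction to items that do not straddle a block boundary are assumptions made for illustration, and block replacement is not modelled.

```python
BLOCK_SIZE = 32  # bytes per data-slave block (from the text)


class DataSlave:
    """Toy model of the fast data slave read path (hypothetical names)."""

    def __init__(self):
        self.cam = {}   # block index -> tag, standing in for CAM 33
        self.ram = []   # tag -> 32-byte block, standing in for RAM 39

    def load_block(self, block_index, data):
        # Models a block arriving from the slow slave or the main store.
        if len(data) != BLOCK_SIZE:
            raise ValueError("a block must hold exactly 32 bytes")
        self.cam[block_index] = len(self.ram)
        self.ram.append(bytes(data))

    def read(self, address, length):
        """Return (VHIT, data); data is None on a miss."""
        # DS1: split the address into a block index and a byte shift.
        block_index, byte_shift = divmod(address, BLOCK_SIZE)
        # DS2: associative lookup; a miss leaves VHIT false.
        tag = self.cam.get(block_index)
        if tag is None:
            return False, None
        # DS3: read the whole block addressed by the tag.
        block = self.ram[tag]
        # DS4: select the required item from the block.
        return True, block[byte_shift:byte_shift + length]
```

On a miss the real unit would trigger an access to the slow slave; here the caller would simply call `load_block` and retry.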



Lower Pipeline

Referring now to Figure 4, this shows the lower pipeline 13 in
more detail.

The lower pipeline includes four pipeline stages LP0 - LP3.

The first stage LP0 contains priority logic, to be described,
for selecting the next slot from OPF to be handled by the lower
pipeline.

Within each stream, instructions are started (and hence finish)
in strict chronological order, starting with the eldest.

The execution of an instruction in the lower pipeline may be
started as soon as it is certain that the operand for it will
be available from the data slave in time for use at stage LP2
of the lower pipeline. Thus, an instruction may be started in
the lower pipeline while the data item is actually being
retrieved from the data slave. In the limiting case, the data
signal RD from the data slave is used directly in LP1,
bypassing OPF. Hence, operations of the pipeline stages of the
data slave and the lower pipeline can be overlapped. This is
important, since it reduces the total transit time of
instructions through the overall pipeline.
LP1 comprises a control store 40 having an address input which
receives the function code F from the selected slot of OPF.F,
by way of a multiplexer 41. The output of the control store 40
comprises a control code, and a next address value. The control
code is passed to LP2 by way of a register 42. The next address
value is fed back to a register 43 in LP0, and can then be
selected by the multiplexer 41 so as to address another
location in the control store 40. Thus it can be seen that, for
each function code F, the control store 40 can produce a
sequence of control codes for the lower pipeline. At the end of
the sequence of control codes, the control store produces an
end of sequence signal, which tells the priority logic in stage
LP0 to select a new slot, and switches the multiplexer 41 so as
to select the next function code from OPF.F, thus initiating a
new sequence.
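The sequencing behaviour of the control store can be sketched as a table walk. The table contents, the control code names and the addresses below are invented purely for illustration; only the structure (control code, next address, end-of-sequence indication) follows the description.

```python
# address -> (control code, next address, end of sequence)
# All entries are hypothetical; the application does not list any.
CONTROL_STORE = {
    0x10: ("LOAD_OPERAND", 0x40, False),
    0x40: ("ADD_TO_ACC", 0x41, False),
    0x41: ("WRITE_RESULT", None, True),
    0x11: ("NEGATE_ACC", None, True),
}


def control_codes(function_code):
    """Yield the control codes produced for one function code F."""
    # The multiplexer first selects F from the slot as the address.
    address = function_code
    while True:
        code, next_address, end_of_sequence = CONTROL_STORE[address]
        yield code
        if end_of_sequence:
            # The priority logic would now select a new slot.
            return
        # The next address is fed back and selected by the multiplexer.
        address = next_address
```

A single-step function code such as `0x11` yields one control code, while `0x10` walks through three table entries before signalling the end of the sequence.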

LP1 also includes a set of arithmetic registers 44. Data can be
loaded into these registers by way of a multiplexing circuit 45
from any of the following sources:

(a) The operand held in the currently selected slot of the OPF.

(b) Data from the registers in LP3.

(c) Slave data RD directly from the data slave (OPF bypass).

Multiplexer 45 can also take bypass data from registers 47 or
ALU 46, where that data has not yet been written into the
required register 411.

LP2 comprises an arithmetic and logic unit (ALU) 46 which
performs an operation on the contents of registers 44 as
specified by the control code held in register 42. The result
of this operation is passed to LP3 by way of registers 47 and
48.

LP2 also contains another parameter file, referred to as the
termination parameter file TPF. Like the other parameter files,
TPF comprises sixteen registers, one for each slot. Whenever an
instruction detects a problem in execution in any of the
pipeline units, an indicator is set in the slot of TPF
corresponding to that instruction.

LP3 includes a condition logic circuit 412 which receives the
output of the register 48 and a control signal from the control
store 40 by way of registers 42 and 410. The circuit 412
performs tests to detect whether a jump condition specified by
the instruction has been satisfied, e.g. whether an accumulator
register is zero. If the jump condition is satisfied, the
circuit 412 produces a jump signal JCON for the scheduler 10.

LP3 contains finish logic 49, which receives a control signal
from the condition logic 412 to indicate whether any problems
have been detected during execution in LP2. The logic 49
examines the signal and the contents of TPF corresponding to
the instruction, and determines whether this instruction has
encountered any problems during execution. When successful
completion is detected, the circuit produces one of two signals
AFinOK or BFinOK, depending on which stream the instruction is
in. The slot allocated to the instruction is then released for
re-use.

LP3 also includes a set of registers 411 which represents the
definitive copies of the registers specified by the instruction
set of the system. These registers receive data from the
register 47. The registers 411 are loaded from register 47 only
when the finish logic 49 indicates that this slot is completing
successfully. In this way, the current process state, as
defined in registers 411, is not corrupted by any errors in the
execution of an instruction. The output of the registers 411
provides the correction signal UP.CORR which can be fed back to
the upper pipeline if required, to correct the local copies of
the registers held there.
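The commit rule for the definitive registers can be summarised in a short Python sketch. The function name and the dictionary representation of the register set are hypothetical; the only point illustrated is that a result reaches the definitive copies solely when the slot completes successfully.

```python
def finish_instruction(definitive, pending_writes, problem_flagged):
    """Return (new definitive registers, finished_ok).

    `definitive` models the register set in LP3, `pending_writes` the
    result waiting to be written, and `problem_flagged` whether any
    problem was recorded for this slot. All names are illustrative.
    """
    if problem_flagged:
        # The result is discarded, so the process state stays intact.
        return dict(definitive), False
    committed = dict(definitive)
    committed.update(pending_writes)
    return committed, True
```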

The output of the register 47 also provides the signal W to the
data slave, and the signal V to the upper pipeline.

Slot pointers.

Referring now to Figure 5, the flow of A-stream instructions
through the pipeline is controlled by a plurality of counters
50-54. A similar set of counters (not shown) is provided for
the B-stream.

Counter 50 produces a signal ALdSlt which indicates the next
slot in the IPF to be loaded by an A-stream instruction from
the scheduler. The counter 50 is incremented by a signal APiLd
whenever an A-stream instruction is loaded into the IPF. Thus
the slots in the IPF are allocated sequentially to successive
instructions.

Counter 51 produces a signal AUStSlt which indicates the next
A-stream slot to be started in the upper pipeline UP in normal
mode (i.e. non-look-ahead mode). Similarly, counter 52 produces
a signal AULaStSlt which indicates the next A-stream slot to be
started in the upper pipeline in look-ahead mode.

Initially, the contents of both the counters 51, 52 are equal,
this condition being detected by an equivalence gate 55.

In normal mode, a look-ahead signal ALAmode is false. Each time
an A-stream instruction is started in the upper pipeline, a
signal UPAStart goes true. This enables an AND gate 56, which
increments the counter 51. Thus, in normal mode, the counter 51
selects A-stream instructions from successive slots of the IPF,
so as to initiate processing of these instructions in the upper
pipeline. At the same time, an AND gate 57 and an OR gate 58
are enabled, which increments the counter 52. Thus, while the
system remains in normal mode, the two counters 51, 52 are
incremented in step with each other.

In look-ahead mode, ALAmode is true, and so the AND gate 56 is
disabled, which prevents the counter 51 from being incremented.
Each time a new A-stream instruction is started in look-ahead
mode, an AND gate 59 is enabled, and this increments the
counter 52. Thus, in look-ahead mode, the counter 52 continues
counting, so as to continue to select successive instructions
from the IPF for starting in the upper pipeline. The counter
51, on the other hand, retains the slot number of the
instruction which was abandoned.

When the system returns to normal mode, ALAmode goes false
again. Thus, each time a new instruction is started in the
upper pipeline, the AND gate 56 is enabled, and the counter 51
is incremented. However, while the contents of the counters are
unequal, the AND gate 57 remains inhibited and this prevents
the counter 52 from being incremented. When the counter 51
eventually catches up with counter 52, both counters will again
be incremented in step with each other. If another look-ahead
is initiated before the counter 51 has caught up, those
instructions that have already run in look-ahead mode will not
do so again, since look-ahead starts from the current value of
counter 52.
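The interplay of counters 51 and 52 may be easier to follow as a small Python model. It is a sketch under assumptions: the class and method names are hypothetical, the counters are unbounded integers rather than wrap-around slot counters, and the `abandon` method simply rewinds the normal-mode pointer to the abandoned slot, because this passage does not spell out how counter 51 comes to hold that slot number.

```python
class UpperStartPointers:
    """Toy model of counters 51 and 52 for one stream."""

    def __init__(self):
        self.normal = 0      # counter 51: next slot to start in normal mode
        self.lookahead = 0   # counter 52: next slot to start in look-ahead mode

    def start(self, la_mode):
        """Model one start in the upper pipeline; return the slot started."""
        if la_mode:
            # Look-ahead mode: only counter 52 advances (AND gate 59).
            slot = self.lookahead
            self.lookahead += 1
            return slot
        slot = self.normal
        in_step = self.normal == self.lookahead   # equivalence gate 55
        self.normal += 1                          # AND gate 56
        if in_step:
            self.lookahead += 1                   # AND gate 57, OR gate 58
        return slot

    def abandon(self, slot):
        # Assumption: counter 51 is made to point at the abandoned slot.
        self.normal = slot
```

For example, starting slots 0 and 1 normally, abandoning slot 1, running two look-ahead starts and then returning to normal mode gives the start order 0, 1, 2, 3 (look-ahead), then 1, 2, 3, 4 (normal), with counter 52 held at 4 until counter 51 catches up.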

Counter 53 produces a signal ALStSlt which indicates the next
A-stream slot to be started in the lower pipeline LP. This
counter 53 is incremented by a signal LPAStart whenever an
A-stream instruction is started in the lower pipeline. Thus,
the instructions are started sequentially in the lower
pipeline.

The counter 54 produces a signal AEldSlt which indicates the
slot holding the eldest A-stream instruction currently in the
pipeline. Instructions are completed in strictly chronological
order, and hence this signal indicates the next A-stream
instruction which is due to finish execution in the lower
pipeline. The counter 54 is incremented by a signal AFinOK
which indicates that an A-stream instruction has successfully
completed execution in the lower pipeline. This releases the
slot.

The contents of the counters 50 and 51 are compared, to produce
a signal AUAllSt, which indicates that all the A-stream
instructions currently in the IPF have now been started in
normal mode. Similarly, the contents of the counters 50 and 52
are compared, to produce a signal AULaAllSt, which indicates
that all the A-stream instructions currently in the IPF have
now been started, with possibly some of them in look-ahead
mode.

The contents of the counters 50 and 54 are compared to produce
a signal AIpfFull which indicates that all the A-stream slots
in the IPF are now full. This prevents any further A-stream
instructions from being loaded into the IPF.



Upper pipeline start controls. 



Figures 6 to 8 show the control logic for starting instructions
in the upper pipeline.

Referring to Figure 6, a multiplexer 60 selects either AUAllSt
or AULaAllSt according to whether the signal ALAmode is false
or true. The inverse of the output of the multiplexer 60
provides a signal AIpfRdy which indicates that there is at
least one A-stream instruction in the IPF ready to be started
in the upper pipeline. A similar signal BIpfRdy is produced for
the B-stream.

The signal AIpfRdy is fed to one input of an AND gate 61, which
produces a signal UPAStart, for initiating an A-stream
instruction in the upper pipeline. The other input of this AND
gate receives the output of an OR gate 62, which receives the
inverse of BIpfRdy and the inverse of a priority signal
UPBStPrefd.

Similarly, BIpfRdy is fed to one input of an AND gate 63, which
produces a signal UPBStart, for initiating a B-stream
instruction in the upper pipeline. The other input of this AND
gate receives the output of an OR gate 64, which receives
UPBStPrefd and the inverse of AIpfRdy.

Thus it can be seen that, when UPBStPrefd is false, A-stream
instructions are initiated in preference to B-stream
instructions: a B-stream instruction can be initiated only if
there are no A-stream instructions ready. Conversely, when
UPBStPrefd is true, B-stream instructions are initiated in
preference to A-stream instructions.
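The gating of Figure 6 reduces to two Boolean expressions, written out below as a Python sketch. The function and parameter names are illustrative; the gate numbers in the comments are those given in the description.

```python
def upper_start(a_ipf_rdy, b_ipf_rdy, upb_st_prefd):
    """Return (UPAStart, UPBStart) for the given ready and preference inputs."""
    # AND gate 61 fed by OR gate 62 (inverse of BIpfRdy, inverse of UPBStPrefd).
    up_a_start = a_ipf_rdy and (not b_ipf_rdy or not upb_st_prefd)
    # AND gate 63 fed by OR gate 64 (UPBStPrefd, inverse of AIpfRdy).
    up_b_start = b_ipf_rdy and (upb_st_prefd or not a_ipf_rdy)
    return up_a_start, up_b_start
```

Enumerating all eight input combinations shows that the two start signals are never true together, and that one of them is true whenever at least one stream has an instruction ready.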

Referring to Figure 7, a multiplexer 70 selects either AUStSlt
or AULaStSlt according to whether ALAmode is false or true. The
output of this multiplexer therefore indicates the slot number
of the next A-stream instruction to be started in the upper
pipeline, in normal or look-ahead mode as the case may be. A
similar multiplexer 71 is provided for the B-stream.

The outputs of the multiplexers are gated by way of AND gates
72, 73 to an OR gate 74, the output of which provides a signal
IpfRdSlt. This signal is used to address the IPF, for reading
out the next instruction for starting in the upper pipeline.

The AND gates 72, 73 are controlled by signals UPAStart and
UPBStart as shown, so as to select the slot number for the A
stream or B stream as required.

Referring now to Figure 8, this shows the logic for producing
the signal UPBStPrefd which indicates that the B stream is
preferred for starting in the upper pipeline.

UPBStPrefd is derived from an OR gate 80 which receives the
outputs of two AND gates 81 and 82. AND gate 81 receives the
signal ALAmode and the inverse of the corresponding signal
BLAmode for the B-stream. Gate 82 receives a signal UPBPri, and
the output of an equivalence gate 83, which combines ALAmode
and BLAmode.

Thus, it can be seen that if one of the two streams is in
look-ahead mode, and the other is in normal mode, the signal
UPBStPrefd gives preference to the stream that is in normal
mode. If, on the other hand, both streams are in the same mode,
then preference is given to the A stream or the B stream
according to whether UPBPri is false or true.
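The preference logic of Figure 8 can be written as a short Python function. The function and variable names are illustrative; the gate numbers are taken from the description.

```python
def upb_st_prefd(a_la_mode, b_la_mode, upb_pri):
    """Return UPBStPrefd: True when the B stream is preferred."""
    gate_81 = a_la_mode and not b_la_mode   # A in look-ahead, B in normal mode
    gate_83 = a_la_mode == b_la_mode        # equivalence: both in the same mode
    gate_82 = upb_pri and gate_83           # UPBPri decides when modes match
    return gate_81 or gate_82               # OR gate 80
```

When only the B stream is in look-ahead mode, both AND gates are disabled, so the result is false and the A stream is preferred regardless of UPBPri.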

The signal UPBPri is produced as follows. 

A counter 84 is loaded with a preset value USPlim whenever a
B-stream instruction is started in the upper pipeline, as
indicated by UPBStart. The counter is then decremented whenever
an A-stream instruction is started, as indicated by UPAStart.
When the count reaches zero an AND gate 85 is enabled, which in
turn enables OR gate 87, so as to make UPBPri true. Thus, it
can be seen that the A stream is normally given priority, but
the B stream is given priority if a predetermined number of
A-stream instructions are started without any corresponding
B-stream starts. The value of USPlim can be preset so as to
achieve a desired balance between the two streams.

The OR gate 87 also receives a signal BUrgent, which gives
priority to the B-stream when pending input/output activity or
inter-processor communication has become critical, over-riding
the effect of the counter 84.
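The starvation counter behind UPBPri can be modelled as follows. This is a hedged sketch: the class name, the initial count, the saturation of the counter at zero and the omission of any second input to AND gate 85 are assumptions, since the description gives only the load, decrement and zero-detect behaviour.

```python
class BPriority:
    """Toy model of counter 84 and OR gate 87 (hypothetical names)."""

    def __init__(self, usp_lim):
        self.usp_lim = usp_lim
        self.count = usp_lim   # assumption: counter starts at the preset value

    def on_start(self, up_a_start, up_b_start):
        if up_b_start:
            # A B-stream start reloads the preset value USPlim.
            self.count = self.usp_lim
        elif up_a_start and self.count > 0:
            # Each A-stream start decrements; holding at zero is an assumption.
            self.count -= 1

    def upb_pri(self, b_urgent=False):
        # UPBPri is true at zero count, or whenever BUrgent overrides.
        return self.count == 0 or b_urgent
```

With `usp_lim=3`, three consecutive A-stream starts make `upb_pri()` true, and the next B-stream start clears it again.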



Look-ahead control logic.

As mentioned above, if it is detected that an instruction
cannot be completed yet because of a dependency on an earlier
instruction, it is abandoned, and a look-ahead mode is
initiated.

Referring to Figure 9, this shows the logic for controlling the
look-ahead mode.

If the instruction to be abandoned is in the A-stream, signals
AAbnDep and ADepWait are produced. AAbnDep sets a flip flop 90.
The output of this flip flop, along with ADepWait, then enables
an AND gate 91, to produce the signal ALAmode, which puts the A
stream into look-ahead mode.

As described above, in look-ahead mode, instructions continue
to be initiated, under control of the counter 52 (Figure 5).
Thus, instructions following the abandoned instruction are
allowed to continue running, so as to prefetch operands, if
necessary. However, these instructions are not allowed to
terminate successfully, or to update the definitive copies of
the registers in the lower pipeline.

The signal ADepWait goes false again when it is detected that
the dependency has now been resolved, or is likely to be
resolved shortly. This disables AND gate 91, making ALAmode
false, and at the same time enables an AND gate 92, producing a
signal ARestRdy.

Since ALAmode is now false, the next A-stream instruction to be
initiated will be the instruction in the slot indicated by
AUStSlt. In other words, the instruction that was abandoned
because of the dependency will now be restarted.

When the instruction is restarted, the signals UPAStart and
ARestRdy enable an AND gate 93, producing a signal ARestarted,
which resets the flip flop 90.

Similar logic (not shown) exists for generating the signal
BLAmode for the B stream.

It should be noted that an abandoned instruction may be
restarted before the dependency has actually been resolved. For
example, consider the case where an instruction has been
abandoned because it requires to read data from a register that
has not yet been written from the slave output data RD of an
earlier instruction. In this case, the signal ADepWait will go
false as soon as an access is initiated in the data slave to
retrieve the required data. Thus, the restarted instruction
will start running in the upper pipeline in parallel with the
access to the data slave by the earlier instruction. By the
time the restarted instruction requires to read the data, it is
likely that the data will have been accessed from the data
slave, and so execution will proceed normally. However, if the
required data is not in the data slave, the restarted
instruction will be abandoned again, and look-ahead mode is
re-activated.

Lower pipeline start controls.



Figures 10 and 11 show the control logic for 
starting instructions in the lower pipeline LP. 

Referring to Figure 10, when the first stage LP0 of the lower
pipeline is available to start executing an instruction, a
signal LPipeAv is produced.

A signal ADsDone is produced, to indicate that the required
data slave access (if any) for the A stream has been completed
or is expected to be completed shortly. This signal consists of
ten bits, one for each A-stream slot. These ten bits are
applied to a multiplexer 100, which selects the bit
corresponding to the next A-stream instruction to be started in
the lower pipeline, as indicated by ALStSlt.

The signal LPipeAv, and the output of the 
multiplexer 100 are combined in an AND gate 101 
to produce a signal LPAStartPoss, which indicates 
that it is now possible to start a new A-stream 
instruction in the lower pipeline. 

Similar logic exists, as shown, to produce a
corresponding signal LPBStartPoss for the B stream.
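
As an illustration, the multiplexer 100 and AND gate 101 can be modelled as below. This is a sketch under stated assumptions: the function name `lp_start_poss` is hypothetical, the ten ADsDone bits are represented as a Python list of booleans, and the slot indication is taken to be an integer index from 0 to 9.

```python
def lp_start_poss(lpipe_av: bool, ds_done: list, lst_slt: int) -> bool:
    """Return the 'start possible' signal for one stream.

    lpipe_av -- LPipeAv: first lower-pipeline stage is available
    ds_done  -- ten DsDone bits, one per slot of the stream
    lst_slt  -- slot of the next instruction to start (assumed 0..9)
    """
    if len(ds_done) != 10:
        raise ValueError("expected ten DsDone bits, one per slot")
    selected = bool(ds_done[lst_slt])  # multiplexer 100
    return lpipe_av and selected       # AND gate 101
```

The same function serves for the B stream when it is given the BDsDone bits and the B-stream slot indication.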

The signal LPAStartPoss is applied to one input
of an AND gate 102, which produces a signal
LPAStart, indicating that an A-stream instruction is
to be started in the lower pipeline. The other input
of this AND gate receives the output of an OR gate
103, the inputs of which receive the inverse of
LPBStartPoss and the inverse of a priority control
signal LPBPri.

Similarly, the signal LPBStartPoss is applied to 
one input of an AND gate 104, which produces a 
signal LPBStart, indicating that a B-stream instruc-
tion is to be started in the lower pipeline. The other
input of this AND gate receives the output of an 
OR gate 105, the inputs of which receive the signal 
LPBPri and the inverse of LPAStartPoss. 

Thus, it can be seen that, if LPBPri is false,
then an A-stream instruction is started whenever
possible, in preference to a B-stream instruction. If
LPBPri is true, then the B-stream instructions are 
given preference. 
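
The arbitration performed by gates 102 to 105 can be written out as Boolean expressions. The following is a minimal sketch; the function name `lower_pipeline_start` is hypothetical, while the expressions follow the gate connections described above.

```python
def lower_pipeline_start(lpa_start_poss: bool,
                         lpb_start_poss: bool,
                         lpb_pri: bool) -> tuple:
    """Return (LPAStart, LPBStart) for the given inputs."""
    # OR gate 103 feeding AND gate 102
    lpa_start = lpa_start_poss and ((not lpb_start_poss) or (not lpb_pri))
    # OR gate 105 feeding AND gate 104
    lpb_start = lpb_start_poss and (lpb_pri or (not lpa_start_poss))
    return lpa_start, lpb_start
```

Evaluating the expressions over all eight input combinations shows that the two start signals are never true together, and that a stream without priority still starts whenever the other stream has nothing ready.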

The priority signal LPBPri is produced by a
logic circuit similar to the circuit shown in Figure 8
for producing UPBPri. In this case, the counter is
controlled by signals LPAStart and LPBStart, and
the preset count value is LSPLim, which may be
different from USPLim.

It should be noted that the signals ADsDone 
and BDsDone can be produced as soon as the 
data slave knows it will be able to provide the 
requested data item without recourse to the slow 
slave or main store. In practice, the signals are 
produced at the same time as the data slave is 
being accessed. This means that the operation of 
the lower pipeline may be overlapped with the 
operation of the slave store, if production of ADs-
Done or BDsDone results in an immediate LPAStart
or LPBStart for the same slot.

Referring now to Figure 11, this shows the 
logic for selecting the slot to be initiated in the 
lower pipeline. 

The signal ALStSlt, indicating the next A-
stream slot to be started in the lower pipeline, is
gated with LPAStart in a set of AND gates 110.
Similarly, BLStSlt is gated with LPBStart in a set of
AND gates 111.

The outputs of gates 110 and 111 are com-
bined in a set of OR gates 112, to produce a signal
LP0Slt, which indicates the slot number of the next
instruction, in the A-stream or B-stream as the case
may be, to be started in the lower pipeline. LP0Slt
is used to access the parameter file OPF so as to
read out the required instruction and operand for
the lower pipeline.

The signal LP0Slt is stored in a register 113
when the instruction enters stage LP1 of the lower
pipeline, to produce a signal LP1Slt. This signal is
gated back to the OR gates 112, by way of a set of
AND gates 114, whenever LPipeAv is false. Thus, if
the lower pipeline is not available to receive a new
instruction (because it is executing a multi-beat
instruction) the current slot number is maintained.
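
The slot selection of Figure 11 can be sketched in the same way. This is an illustrative model only: the function name `next_lp0_slt` is hypothetical, and slot numbers are assumed to be small non-negative integers so that the sets of AND and OR gates can be modelled with bitwise operations. The sketch relies on the fact, stated earlier, that neither start signal can be true while LPipeAv is false.

```python
def next_lp0_slt(alst_slt: int, blst_slt: int,
                 lpa_start: bool, lpb_start: bool,
                 lp1_slt: int, lpipe_av: bool) -> int:
    """Return the slot number presented to stage LP0."""
    a_term = alst_slt if lpa_start else 0   # AND gates 110
    b_term = blst_slt if lpb_start else 0   # AND gates 111
    hold = lp1_slt if not lpipe_av else 0   # AND gates 114
    return a_term | b_term | hold           # OR gates 112
```

For example, with the pipeline unavailable and both start signals false, the function returns the slot number held in register 113, which corresponds to maintaining the current slot during a multi-beat instruction.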

It should be noted that, for each instruction, all
changes to the process state are made at the same
time, upon successful termination in stage LP3.
Thus, instructions can be considered atomic, being
executed either in full or not at all. This simplifies
control and recovery of the pipeline on jumps and
errors.



Claims 

1. Data processing apparatus comprising a plu-
rality of pipeline stages (UP0, UP1- DS0, DS1-
LP0, LP1-) for executing a sequence of instruc-
tions in a pipelined manner,
characterised in that:

(a) upon detection of a dependency between
a first instruction and a second, subsequent in-
struction, the second instruction is abandoned, and
a look-ahead mode of operation is initiated,

(b) in the look-ahead mode, instructions sub-
sequent to the abandoned instruction are allowed
to continue to be executed so as to prefetch any
operands for those instructions, but are not fully
executed, and

(c) when the look-ahead mode is terminated,
execution is re-started at the abandoned instruc-
tion.

2. Apparatus according to claim 1 wherein the 
look-ahead mode is terminated when the depen- 
dency is about to be resolved. 

3. Apparatus according to claim 2 wherein the
look-ahead mode is terminated when a data slave
store access is initiated to retrieve data required for
resolving the dependency, but before the data is
actually available.

4. Apparatus according to any preceding claim,
including:

(a) a first counter (51) for indicating the next
instruction to be started in normal mode,

(b) a second counter (52) for indicating the
next instruction to be started in look-ahead mode,
and

(c) means (55-59) for incrementing both
counters in step with each other while the counters
are equal in normal mode, for incrementing only
the second counter in look-ahead mode, and oper-
ative on return to normal mode for incrementing
only the first counter, until it is again equal to the
second counter.

5. Apparatus according to any preceding claim,
including means (10) for initiating first and second
independent streams of instructions, wherein each
stream can be independently put into look-ahead
mode.

6. Apparatus according to claim 5 wherein,
when one stream is in the look-ahead mode and
the other is in the normal mode, the stream in the
normal mode is given execution priority over the
other stream.